766 research outputs found

    Displayed Categories

    Get PDF
    We introduce and develop the notion of displayed categories. A displayed category over a category C is equivalent to "a category D and functor F : D -> C", but instead of having a single collection of "objects of D" with a map to the objects of C, the objects are given as a family indexed by objects of C, and similarly for the morphisms. This encapsulates a common way of building categories in practice, by starting with an existing category and adding extra data/properties to the objects and morphisms. The interest of this seemingly trivial reformulation is that various properties of functors are more naturally defined as properties of the corresponding displayed categories. Grothendieck fibrations, for example, when defined as certain functors, use equality on objects in their definition. When defined instead as certain displayed categories, no reference to equality on objects is required. Moreover, almost all examples of fibrations in nature are, in fact, categories whose standard construction can be seen as going via displayed categories. We therefore propose displayed categories as a basis for the development of fibrations in the type-theoretic setting, and similarly for various other notions whose classical definitions involve equality on objects. Besides giving a conceptual clarification of such issues, displayed categories also provide a powerful tool in computer formalisation, unifying and abstracting common constructions and proof techniques of category theory, and enabling modular reasoning about categories of multi-component structures. As such, most of the material of this article has been formalised in Coq over the UniMath library, with the aim of providing a practical library for use in further developments.
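
    A minimal sketch of the definition in LaTeX, paraphrasing the data described above rather than quoting the article's formal statement; the notation (D_x, bars, etc.) is ours, and the coherence laws and the precise UniMath formulation are omitted:

    A displayed category $D$ over $C$ is given by:
    \begin{itemize}
      \item for each object $x$ of $C$, a type $D_x$ of objects over $x$;
      \item for each morphism $f : x \to y$ in $C$ and $\bar x \in D_x$, $\bar y \in D_y$,
            a type $D_f(\bar x, \bar y)$ of morphisms over $f$;
      \item identity lifts $\overline{\mathrm{id}}_{\bar x} \in D_{\mathrm{id}_x}(\bar x, \bar x)$
            and composition lifts $\bar g \circ \bar f \in D_{g \circ f}(\bar x, \bar z)$
            for $\bar f \in D_f(\bar x, \bar y)$ and $\bar g \in D_g(\bar y, \bar z)$,
    \end{itemize}
    subject to unit and associativity laws lying over those of $C$. The total category
    $\int D$ has pairs $(x, \bar x)$ as objects and pairs $(f, \bar f)$ as morphisms,
    and the projection $\int D \to C$ recovers the functor $F : D \to C$ of the
    equivalent classical presentation.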

    Analysis of spatial and temporal dynamics of xylem refilling in Acer rubrum L. using magnetic resonance imaging.

    Get PDF
    We report results of an analysis of embolism formation and subsequent refilling observed in stems of Acer rubrum L. using magnetic resonance imaging (MRI). MRI is one of the very few techniques that can provide direct, non-destructive observations of the water content within opaque biological materials at micrometer resolution. Thus, it has been used to determine temporal dynamics and water distributions within xylem tissue. In this study, we found good agreement between MRI pixel-brightness measures of xylem liquid water content and the percent loss of hydraulic conductivity (PLC) in response to water stress (P50 values of 2.51 and 2.70 for MRI and PLC, respectively). These data provide strong support that pixel brightness is well correlated with PLC and can be used as a proxy for PLC even when single vessels cannot be resolved in the image. Pressure-induced embolism in moderately stressed plants resulted in an initial drop of pixel brightness. This drop was followed by a gain in brightness over the 100 min following pressure application, suggesting that plants can restore water content in the stem after induced embolism. This recovery was limited to the current-year wood ring; older wood did not show signs of recovery within the length of the experiment (16 h). In vivo MRI observations of the xylem of moderately stressed (~-0.5 MPa) A. rubrum stems revealed evidence of spontaneous embolism formation followed by rapid refilling (~30 min). Spontaneous (not induced) embolism formation was observed only once, despite over 60 h of continuous MRI observations made on several plants. Thus, this observation provides evidence for the presence of a naturally occurring embolism-refilling cycle in A. rubrum, but it is impossible to draw conclusions about its frequency in nature.

    Sparse Tensor Transpositions

    Full text link
    We present a new algorithm for transposing sparse tensors called Quesadilla. The algorithm converts the sparse tensor data structure to a list of coordinates and sorts it with a fast multi-pass radix algorithm that exploits knowledge of the requested transposition and the tensor's input partial coordinate ordering to provably minimize the number of parallel partial sorting passes. We evaluate both a serial and a parallel implementation of Quesadilla on a set of 19 tensors from the FROSTT collection, a set of tensors taken from scientific and data analytic applications. We compare Quesadilla and a generalization, Top-2-sadilla, to several state-of-the-art approaches, including the tensor transposition routine used in the SPLATT tensor factorization library. In serial tests, Quesadilla was the best strategy for 60% of all tensor and transposition combinations and improved over SPLATT by at least 19% in half of the combinations. In parallel tests, at least one of Quesadilla or Top-2-sadilla was the best strategy for 52% of all tensor and transposition combinations.
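
    The core move, flattening to a coordinate list and reordering by the permuted mode order, can be sketched as below. This illustrative sketch uses a single comparison sort keyed on the permuted coordinates rather than Quesadilla's planned multi-pass radix passes; the function name and data layout are assumptions made for the example.

    # Sketch: transpose a sparse tensor stored as a coordinate (COO) list.
    # Quesadilla instead plans a minimal sequence of partial radix-sort passes
    # from the requested transposition and the input's partial ordering; here
    # we simply sort by the permuted coordinate tuple.
    def transpose_coo(coords, values, perm):
        """coords: list of index tuples; values: parallel nonzero values;
        perm: the requested mode permutation."""
        permuted = [tuple(c[p] for p in perm) for c in coords]
        order = sorted(range(len(values)), key=lambda i: permuted[i])
        return [permuted[i] for i in order], [values[i] for i in order]

    # Example: reorder a 3-way tensor from (i, j, k) to (k, i, j) mode order.
    coords = [(0, 1, 2), (1, 0, 0), (0, 0, 1)]
    values = [1.0, 2.0, 3.0]
    print(transpose_coo(coords, values, perm=(2, 0, 1)))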

    On Optimal Partitioning For Sparse Matrices In Variable Block Row Format

    Full text link
    The Variable Block Row (VBR) format is an influential blocked sparse matrix format designed to represent shared sparsity structure between adjacent rows and columns. VBR consists of groups of adjacent rows and columns, storing the resulting blocks that contain nonzeros in a dense format. This reduces the memory footprint and enables optimizations such as register blocking and instruction-level parallelism. Existing approaches use heuristics to determine which rows and columns should be grouped together. We adapt and optimize a dynamic programming algorithm for sequential hypergraph partitioning to produce a linear-time algorithm which can determine the optimal partition of rows under an expressive cost model, assuming the column partition remains fixed. Furthermore, we show that the problem of determining an optimal partition for the rows and columns simultaneously is NP-hard under a simple linear cost model. To evaluate our algorithm empirically against existing heuristics, we introduce the 1D-VBR format, a specialization of the VBR format in which columns are left ungrouped. We evaluate our algorithms on all 1626 real-valued matrices in the SuiteSparse Matrix Collection. When asked to minimize an empirically derived cost model for a sparse matrix-vector multiplication kernel, our algorithm produced partitions whose 1D-VBR realizations achieve a speedup of at least 1.18 over an unblocked kernel on 25% of the matrices, and a speedup of at least 1.59 on 12.5% of the matrices. The 1D-VBR representation produced by our algorithm had faster SpMVs than the 1D-VBR representations produced by any existing heuristic on 87.8% of the test matrices.
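
    The row-partitioning step can be read as a classic dynamic program over contiguous row groups; the quadratic sketch below shows the recurrence (the paper's contribution is specializing it to run in linear time under its blocked-SpMV cost model). The cost function and names here are placeholders, not the paper's model.

    # Sketch: optimal partition of rows into contiguous groups under a cost model,
    # with the column partition held fixed.  dp[i] = cheapest cost of rows 0..i-1.
    def optimal_row_partition(n_rows, cost, max_group=None):
        """cost(j, i) models the cost of making rows j..i-1 one block row."""
        INF = float("inf")
        dp = [0.0] + [INF] * n_rows
        split = [0] * (n_rows + 1)
        for i in range(1, n_rows + 1):
            lo = 0 if max_group is None else max(0, i - max_group)
            for j in range(lo, i):
                c = dp[j] + cost(j, i)
                if c < dp[i]:
                    dp[i], split[i] = c, j
        bounds, i = [], n_rows          # recover the group boundaries
        while i > 0:
            bounds.append((split[i], i))
            i = split[i]
        return dp[n_rows], bounds[::-1]

    # Toy cost: fixed per-group overhead plus a per-row term, groups capped at 4 rows.
    print(optimal_row_partition(10, lambda j, i: 2.0 + (i - j), max_group=4))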

    An Efficient Fill Estimation Algorithm for Sparse Matrices and Tensors in Blocked Formats

    Get PDF
    Tensors, linear-algebraic extensions of matrices to arbitrary dimensions, have numerous applications in computer science and computational science. Many tensors are sparse, containing more than 90% zero entries. Efficient algorithms can leverage sparsity to do less work, but the irregular locations of the nonzero entries pose challenges to performance engineers. Many tensor operations, such as tensor-vector multiplications, can be sped up substantially by breaking the tensor into equally sized blocks (storing only the blocks which contain nonzeros) and performing operations in each block using carefully tuned code. However, selecting the best block size is computationally challenging. Previously, Vuduc et al. defined the fill of a sparse tensor to be the number of stored entries in the blocked format divided by the number of nonzero entries, and showed that the fill can be used as an effective heuristic to choose a good block size. However, they gave no accuracy bounds for their method for estimating the fill, and it is vulnerable to adversarial examples. In this paper, we present a sampling-based method for finding a (1 + epsilon)-approximation to the fill of an order-N tensor for all block sizes less than B, with probability at least 1 - delta, using O(B^(2N) log(B^N / delta) / epsilon^2) samples for each block size. We introduce an efficient routine to sample for all B^N block sizes at once in O(N B^N) time. We extend our concentration bounds to a more efficient bound based on sampling without replacement, using the recent Hoeffding-Serfling inequality. We then implement our algorithm and compare our scheme to that of Vuduc et al., as implemented in the Optimized Sparse Kernel Interface (OSKI) library. We find that our algorithm provides faster estimates of the fill at all accuracy levels, providing evidence that this is both a theoretical and practical improvement. Our code is available under the BSD 3-clause license at https://github.com/peterahrens/FillEstimation
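
    For the matrix (order-2) case, the sampling idea can be sketched as follows: the fill for a b1 x b2 blocking equals the block volume times the expected reciprocal of the nonzero count of the block containing a uniformly sampled nonzero. The sketch below computes exact per-block counts for clarity and handles one block size at a time; the paper's routine instead estimates all block sizes at once. Names are illustrative, not the paper's API.

    # Sketch: sampling estimator of the fill of a sparse matrix for one block size.
    # fill = b1 * b2 * E[1 / c], where c is the nonzero count of the block
    # containing a nonzero drawn uniformly at random.
    import random
    from collections import Counter

    def estimate_fill(coords, b1, b2, n_samples=1000, seed=0):
        """coords: list of (i, j) positions of the nonzeros."""
        counts = Counter((i // b1, j // b2) for i, j in coords)
        rng = random.Random(seed)
        total = sum(1.0 / counts[(i // b1, j // b2)]
                    for i, j in (rng.choice(coords) for _ in range(n_samples)))
        return b1 * b2 * total / n_samples

    def exact_fill(coords, b1, b2):
        return len({(i // b1, j // b2) for i, j in coords}) * b1 * b2 / len(coords)

    coords = [(i, (7 * i) % 50) for i in range(50)]
    print(estimate_fill(coords, 2, 2), exact_fill(coords, 2, 2))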

    Shock compression of single-crystal forsterite

    Get PDF
    Dynamic compression results are reported for single-crystal forsterite loaded along the orthorhombic a and c axes to pressures from 130 to 165 GPa. Hugoniot states for the two axes are well described by a single curve offset to densities 0.15–0.20 g/cm^3 lower than earlier data for single-crystal forsterite shocked along the b axis above 100 GPa. Earlier data of Syono et al. [1981a] show marginal support for similar b-axis behavior in the mixed-phase region from 50 to 92 GPa. Thus, shocked forsterite is most compressible in the b direction in the mixed-phase and high-pressure regimes (P > 50 GPa). These data represent the highest pressures for which shock properties have been observed to depend on crystal orientation. Theoretical Hugoniots for mixed-oxide and perovskite-structure high-pressure assemblages of forsterite calculated from recent experimental data are virtually identical and agree with the b-axis data. The a- and c-axis data are also consistent with both high-pressure assemblages because uncertainties in equation of state parameters produce a broad range of computed Hugoniots. Our calculated “average” Hugoniot is up to 0.13 g/cm^3 less dense than the preferred theoretical Hugoniots, in agreement with earlier measurements on dense polycrystalline forsterite. Interpolation between the single-crystal forsterite Hugoniots and Hugoniots for fayalite and Fo_(45) gives Fo_(88) Hugoniots bracketing Twin Sisters dunite data not previously well fit by systematics. Release paths are steep for the a and b axes, but c-axis release paths are much shallower. Hugoniot elastic limits measured for the a and b axes are in good agreement with previous data of Syono et al.; however, the present data for the a axis reveal a triple-wave structure: two deformational shock waves as well as the elastic shock, a feature not previously found. The second shock, with an amplitude of about 9 GPa and a shock temperature of about 350 K, could perhaps be explained by the forsterite α→β or γ phase transformation.

    Shock wave equations of state using mixed-phase regime data

    Get PDF
    A method is given that uses Hugoniot data in the mixed-phase regime to further constrain the equation of state (EOS) parameters of the low- and high-pressure phases of materials undergoing phase transformations on shock loading. We compute the relative proportion of low- and high-pressure phases present in the mixed-phase region and apply additional tests to the EOS parameters of the separate low- and high-pressure phases by invoking two simple requirements: the fraction of high-pressure phase (1) must increase with increasing shock pressure, and (2) must approach one at the high-pressure end of the mixed-phase regime. We apply our analysis to previously published data for potassium thioferrite, KFeS_2, and pyrrhotite, Fe_(0.9)S. We find that including the mixed-phase regime data in the KFeS_2 analysis requires no change in the published high-pressure EOS parameters. For Fe_(0.9)S we must modify the high-pressure phase EOS parameters to account for both the mixed-phase and high-pressure phase Hugoniot data. Our values of the zero-pressure density, bulk modulus, and first pressure derivative of the bulk modulus of the high-pressure phase of Fe_(0.9)S are 5.3 Mg/m^3, 106 GPa, and 4.9, respectively.
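
    For reference, under the usual assumption that the specific volumes of the two phases mix by mass fraction at a common shock pressure, the high-pressure-phase fraction in such an analysis takes the form below. This is the standard mixed-phase relation written out for illustration, with our own notation (V_lp, V_hp), not an equation quoted from the paper.

    % Mass fraction x of the high-pressure phase at shock pressure P, assuming
    % volume additivity: V(P) = (1 - x) V_lp(P) + x V_hp(P).
    \[
      x(P) \;=\; \frac{V_{\mathrm{lp}}(P) - V(P)}{V_{\mathrm{lp}}(P) - V_{\mathrm{hp}}(P)},
    \]
    % where V(P) is the measured Hugoniot specific volume and V_lp, V_hp are the
    % low- and high-pressure-phase Hugoniots evaluated at P.  The constraints in
    % the abstract then read: dx/dP >= 0 across the mixed-phase regime and
    % x -> 1 at its upper end.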

    LATE Ain'T Earley: A Faster Parallel Earley Parser

    Full text link
    We present the LATE algorithm, an asynchronous variant of the Earley algorithm for parsing context-free grammars. The Earley algorithm is naturally task-based, but is difficult to parallelize because of dependencies between the tasks. The LATE algorithm uses additional data structures to maintain information about the state of the parse so that work items may be processed in any order. This property allows the LATE algorithm to be sped up using task parallelism. We show that the LATE algorithm can achieve a 120x speedup over the Earley algorithm on a natural language task.
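
    As a point of reference, a minimal serial Earley recognizer is sketched below; LATE reorganizes exactly this computation so that items need not be processed chart by chart. The grammar encoding and names are illustrative assumptions, not the paper's implementation, and nullable (epsilon) productions are not handled.

    # Minimal serial Earley recognizer (the baseline that LATE parallelizes).
    # Grammar: dict mapping a nonterminal to a list of right-hand-side tuples.
    # An item is (lhs, rhs, dot, origin); chart[k] holds items ending at position k.
    def earley_recognize(grammar, start, tokens):
        n = len(tokens)
        chart = [set() for _ in range(n + 1)]
        chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
        for k in range(n + 1):
            worklist = list(chart[k])
            while worklist:
                lhs, rhs, dot, origin = worklist.pop()
                if dot < len(rhs):
                    sym = rhs[dot]
                    if sym in grammar:                      # predict
                        for prod in grammar[sym]:
                            item = (sym, prod, 0, k)
                            if item not in chart[k]:
                                chart[k].add(item)
                                worklist.append(item)
                    elif k < n and tokens[k] == sym:        # scan
                        chart[k + 1].add((lhs, rhs, dot + 1, origin))
                else:                                       # complete
                    for l2, r2, d2, o2 in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            item = (l2, r2, d2 + 1, o2)
                            if item not in chart[k]:
                                chart[k].add(item)
                                worklist.append(item)
        return any(l == start and d == len(r) and o == 0
                   for l, r, d, o in chart[n])

    # Toy grammar: S -> S '+' S | 'a'
    grammar = {"S": [("S", "+", "S"), ("a",)]}
    print(earley_recognize(grammar, "S", ["a", "+", "a"]))   # True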